Market Location Analysis

Abstract

Using geo-segmentation, the researcher evaluates the market potential of each store location of a retail chain in the Bergen area, Norway, by implementing the Delaunay triangulation algorithm. In terms of practicality, two R packages for the algorithm are compared. Furthermore, the researcher documents the steps for using the Google Places API. Limitations of the research are discussed.

1.Introduction

A market area is the part of the earth’s surface from which the current or potential customers of a supply location come. A supply location can provide goods, as retailers do, or services, as hospitals and telecommunication towers do (Berman and Evans, 2013).

The market area of a store or location can be regarded as the spatial equivalent of the sum of all its customers. The total number of a supply location’s customers can be determined by summing the customer flows from each geographic region in its market area (Huff and McCallum, 2008).

In the case of retailers, a market area is influenced by many factors, such as the transportation costs (e.g. distance, travel time) between customers and locations and the characteristics of the competitors (Wieland, 2017).

Market area analysis can be used in retail location analysis to find new locations or to evaluate existing ones (Berman and Evans, 2013). In this project, the researcher carried out a descriptive analysis to evaluate the market potential of the store locations of “Bunnpris”, a retail chain in Norway. The focus is on the Bergen area.

2.Research Question

Focusing on the retail sector, the research chooses a brand and examines whether its existing supply locations over- or under-exploit the market potential in each shop’s coverage area. The market potential is assumed to be the total population of that area. More precisely, the research question can be formulated as:

Is there a relation between a retail location and the population in its area?

Thus, the answer to this question is descriptive rather than normative.

3.Motivation and contribution

The aim of this project is mostly practical: to strengthen the researcher’s analytical skills in spatial data analysis. In addition, the research documents best practices for some spatial data operations relevant to this project, for example using the Google Places API service and comparing the “deldir” and “dismo” R packages for implementing Delaunay triangulation.

4.Data Sources and Cleaning

4.1.Shapefiles

# Required libraries
library(sf)
library(sp)
library(raster)
library(dplyr)
library(spData)
library(knitr)
library(googleway)
library(tmap)
library(kableExtra)
library(ggplot2)
library(DT)
library(data.table)

Dataset 1: A grid of the map of Norway in which each cell is 1 km x 1 km, downloaded from Statistics Norway (SSB). The dataset can be downloaded from this link.

ssb1km <- st_read(dsn = "~/Downloads/nhh/spatial data analysis/project/ssb001k" , layer = "ssb1km")
Reading layer `ssb1km' from data source `/home/sherif/Downloads/nhh/spatial data analysis/project/ssb001k' using driver `ESRI Shapefile'
Simple feature collection with 533918 features and 6 fields
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: -106000 ymin: 6419000 xmax: 1131000 ymax: 7971000
epsg (SRID):    32633
proj4string:    +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
SSBID           RSIZE  ROW  COL  XCOOR  YCOOR    geometry
20230006419000  1000   130  1    23500  6419500  list(c(23000, 23000, 24000, 24000, 23000, 6419000, 6420000, 6420000, 6419000, 6419000))
20240006419000  1000   131  1    24500  6419500  list(c(24000, 24000, 25000, 25000, 24000, 6419000, 6420000, 6420000, 6419000, 6419000))
20250006419000  1000   132  1    25500  6419500  list(c(25000, 25000, 26000, 26000, 25000, 6419000, 6420000, 6420000, 6419000, 6419000))
20260006419000  1000   133  1    26500  6419500  list(c(26000, 26000, 27000, 27000, 26000, 6419000, 6420000, 6420000, 6419000, 6419000))
20270006419000  1000   134  1    27500  6419500  list(c(27000, 27000, 28000, 28000, 27000, 6419000, 6420000, 6420000, 6419000, 6419000))

The variable SSBID is the ID for each cell in the grid.
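Judging from the five rows printed above, the SSBID appears to encode the south-west corner of each cell: the easting plus 2,000,000, followed by the seven-digit northing. The sketch below decodes an ID under that assumption. The rule is inferred from the sample rows rather than taken from documentation, and `ssbid_to_corner` is a hypothetical helper name.

```r
# Hypothetical helper: decode an SSB grid ID into the south-west corner
# of its cell (UTM zone 33N, metres). The encoding rule is an assumption
# inferred from the printed rows: ID = (easting + 2000000) * 1e7 + northing.
ssbid_to_corner <- function(ssbid) {
  ssbid <- as.numeric(ssbid)
  data.frame(
    x_sw = ssbid %/% 1e7 - 2e6,  # easting of the south-west corner
    y_sw = ssbid %% 1e7          # northing of the south-west corner
  )
}

corner <- ssbid_to_corner("20230006419000")

# Cell centre of a 1000 m cell, comparable to the XCOOR / YCOOR columns
centre <- corner + 500
```

For the first printed row this gives a corner of (23000, 6419000) and a centre of (23500, 6419500), which agrees with the XCOOR and YCOOR values shown.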

Dataset 2: Since the previous file does not contain the names of counties or the administrative boundaries of Norway, and because the focus of this project is on the Bergen area, the administrative boundary geo-data of Norway was downloaded from www.gadm.org.

adm <- read_sf( dsn = "~/Downloads/nhh/spatial data analysis/project/NOR_adm" , layer = "NOR_adm2")

Checking the administrative boundaries data file.

DT::datatable(adm)

Since the Bergen area is the one of interest, it has to be separated from the rest of the data.

Bergen = adm %>% filter(NAME_2 == "Bergen")

Setting the Bergen administrative boundaries data to the same CRS as the grid data.

Bergen <- st_transform(Bergen, crs="+proj=utm +zone=33 +datum=WGS84 +units=m +no_defs")
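An alternative to retyping the proj4 string is to reuse the CRS of the layer you want to match, which avoids typos. The sketch below illustrates this with two stand-in objects (not the real data). EPSG:32633 is the code reported in the grid printout, and the longitude/latitude pair is the Bergen centre used later in the radius search.

```r
library(sf)

# Stand-in for the grid layer: one point already in UTM zone 33N (EPSG:32633)
grid_like <- st_sfc(st_point(c(23500, 6419500)), crs = 32633)

# Stand-in for the boundary layer: a point in Bergen in longitude/latitude
bergen_like <- st_sfc(st_point(c(5.3221, 60.3913)), crs = 4326)

# Reuse the CRS of the grid layer rather than typing the proj4 string
bergen_utm <- st_transform(bergen_like, crs = st_crs(grid_like))

# Both layers now share a CRS, which st_intersection() requires
same_crs <- st_crs(bergen_utm) == st_crs(grid_like)
```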

Next, the administrative boundaries data of Bergen and the grid data had to be intersected, so that only the grid cells of Bergen remain.

#Intersect layers
ssb1kmBergen<- st_intersection(ssb1km, Bergen)

Plotting the results.

#plot ssbid
tmap_mode("plot")
tm_shape(ssb1kmBergen) + tm_polygons() + 
tm_credits("Bergen Grid", position=c("right", "bottom"),size = 1.5)

Dataset 3: The total population of each cell can be fetched from here.

# Reading population data
population = read.csv("Ruter1000m_beflandet_2018.csv" , sep=";", header=TRUE)

Checking population data file.

knitr::kable(head(population,n = 5))
ssbid_1000m   pop_tot  pop_mal  pop_fem  pop_ave
1.925001e+13  11       7        4        49,5
1.925001e+13  135      71       64       38,4
1.925001e+13  33       17       16       50,8
1.926001e+13  4        2        2        0,0
1.926001e+13  25       13       12       38,7
#DT::datatable(population,n = 5)

The variables of interest in this dataset are “ssbid_1000m”, the ID of each grid cell (corresponding to SSBID in the grid data), and “pop_tot”, the total population of each cell.

To merge the population data with the grid data, some joining operations had to be done.

# Changing the class of the cell IDs in both datasets to character

population$ssbid_1000m= as.character(population$ssbid_1000m)
ssb1kmBergen$SSBID= as.character(ssb1kmBergen$SSBID)

#Joining population data with the shapefile
Bergen_popluation = left_join(ssb1kmBergen, population, by = c("SSBID"= "ssbid_1000m"))
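The population printout shows `ssbid_1000m` as `1.925001e+13`, so the column was read as a number. `as.character()` on a numeric value can switch to scientific notation for some values, which could silently break a join against IDs stored as plain digits. The sketch below shows two ways to keep plain digits. The IDs and the inline CSV text are illustrative stand-ins, not the real file.

```r
# IDs as they might arrive when read.csv() parses the column as numeric
ids_numeric <- c(19250006419000, 20230006419000)

# sprintf("%.0f") always writes plain digits, never scientific notation
ids_plain <- sprintf("%.0f", ids_numeric)

# as.character() can switch to scientific notation for round numbers
round_number <- as.character(1e5)   # "1e+05"

# Alternative: read the ID column as character in the first place
csv_text <- "ssbid_1000m;pop_tot\n19250006419000;11\n20230006419000;135"
pop <- read.csv(text = csv_text, sep = ";",
                colClasses = c(ssbid_1000m = "character"))
```

Reading the column as character up front is probably the more robust choice, since the IDs are labels rather than quantities.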

So now we have a final object called “Bergen_popluation”. We can visualize its information as follows:

tmap_mode("view")
tm_shape(Bergen_popluation)+ tm_fill(col ="pop_tot",breaks = c(0, 1 ,1000,2000,3000,4000, 5000, 6000, 7000, 8000))
#+  tm_credits("Bergen population for each cell", position=c("right", "bottom"),size = 1.5)

As the map shows, the total population is missing for some grid cells. The researcher’s interpretation is that these are unpopulated areas, so the missing values can be converted from NA to zero.

Bergen_popluation$pop_tot[is.na(Bergen_popluation$pop_tot)] <- 0
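Before overwriting missing values, it can help to count them, since an unexpectedly large number of NAs may point to mismatched IDs rather than unpopulated cells. The sketch below uses toy tables and base R `merge()` in place of `left_join()` so that it runs without extra packages.

```r
# Toy grid cells and population table; cell "c" has no population record
cells <- data.frame(SSBID = c("a", "b", "c"))
population <- data.frame(ssbid_1000m = c("a", "b"), pop_tot = c(11, 135))

# merge(all.x = TRUE) plays the role of left_join(): all cells are kept
joined <- merge(cells, population,
                by.x = "SSBID", by.y = "ssbid_1000m", all.x = TRUE)

# Count the unmatched cells before overwriting them
n_missing <- sum(is.na(joined$pop_tot))

# Treat cells without a record as unpopulated
joined$pop_tot[is.na(joined$pop_tot)] <- 0
```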

4.2.Bunnpris Geo-coded Locations Using Google API

With the geo-data of the Bergen area in place, the next step was to find a suitable retail brand for this project. The “Bunnpris” retail chain was chosen because it has a reasonable number of branches across Bergen: not so few that no insights can be drawn, and not so many that the mapping becomes complicated.

To retrieve the geo-locations of Bunnpris’ branches across Bergen, the Google Places API was used. Using this service requires generating an API key. Instructions for getting the API key can be found on this page.

To handle the Google service from R, the googleway library was installed. The following steps were done:

1- Using set_key to make the API key available to all the google_ functions, so that there is no need to specify the key parameter within those functions

# set_key(key = "xxx xxx xxx")

2- Searching for Bunnpris near Bergen area

# bunnpris <- google_places(search_string = "bunnpris near Bergen")

Searching by the string “near Bergen” gives better results than “bergen,hordaland” or “,bergen”.

This function returned a nested list, one of whose components is a dataframe containing the addresses and geolocations of the Bunnpris stores in Bergen.

The googleway library also allows searching within a specific radius around a given centroid.

#bunnpris <- google_places(location = c(60.3913, 5.3221), keyword = "bunnpris", radius = 5000)

However, this function did not give better results than the previous one.

3- As mentioned above, the output of the string search is a multiply nested list. Therefore, a series of operations had to be done in order to get the geocoded locations separately.

#bunnpris.results = bunnpris$results

#bunnpris.geometery= bunnpris.results$geometry
#bunnpris.geocode = bunnpris.geometery$location
#bunnpris.geocode$adress = bunnpris.results$formatted_address
## adding the adress column 
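The extraction steps above can be tried without an API key on a hand-built stand-in that has the same nesting (results, then geometry, then location). The object below is a mock: the real response contains more fields, and its `results` component is a dataframe rather than a plain list. The two addresses are taken from the table of stores in this section.

```r
# Hand-built stand-in for the parsed API response; only the nesting matters
bunnpris_mock <- list(
  status = "OK",
  results = list(
    formatted_address = c("Nygårdsgaten 89, 5008 Bergen, Norway",
                          "Torggaten 7, 5014 Bergen, Norway"),
    geometry = list(
      location = data.frame(lat = c(60.38507, 60.39135),
                            lng = c(5.331579, 5.322377))
    )
  )
)

# Walk down the nesting: results -> geometry -> location
geocode <- bunnpris_mock$results$geometry$location

# Attach the address column (spelled "adress", as in the original data)
geocode$adress <- bunnpris_mock$results$formatted_address
```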

Now we have a dataframe containing the addresses and geo-coded locations of 18 Bunnpris shops in the Bergen area. According to the Bunnpris website, there are only 16 shops in the Bergen area; the website may use a different definition of Bergen’s administrative boundaries.

bunnpris.geocode = read.csv("bunnpris.geocode.csv")
knitr::kable(head(bunnpris.geocode,n = 5))
X  lat       lng       adress
1  60.38507  5.331579  Nygårdsgaten 89, 5008 Bergen, Norway
2  60.39135  5.322377  Torggaten 7, 5014 Bergen, Norway
3  60.37111  5.367897  Lægdesvingen 61, 5096 Bergen, Norway
4  60.37869  5.353672  Møllendalsveien 61B, 5009 Bergen, Norway
5  60.39311  5.314620  Nøstegaten 52, 5011 Bergen, Norway

4- Converting the dataframe into a spatial object.

bunnpris_sf_longlat <- bunnpris.geocode %>% 
   st_as_sf(coords = c('lng', 'lat'), crs = "+proj=longlat +datum=WGS84")



# Converting the CRS to the one used by our base data

bunnpris_sf <- st_transform(bunnpris_sf_longlat, crs="+proj=utm +zone=33 +datum=WGS84 +units=m +no_defs")

Now we can see on the map where the Bunnpris branches are.

tmap_mode("view")
tm_shape(Bergen_popluation)+ tm_fill(col ="pop_tot",breaks = c(0, 1 ,1000,2000,3000,4000, 5000, 6000, 7000, 8000)) + tm_shape(bunnpris_sf) + tm_symbols(shape=20,size =.1,col = "green" )  + tm_legend(outside = TRUE, outside.position = "bottom", stack = "horizontal")

5.Descriptive Analysis

As mentioned, the market area of a store can be considered the spatial equivalent of the sum of all its customers. In our case, we assume that each person in Bergen’s population is a potential customer of only his or her nearest Bunnpris store. Therefore, the nearest store for each cell of Bergen’s grid had to be determined. The total population of the cells closest to each store is then considered its targeted (“potential”) market. The algorithm suggested for this proximity analysis is “Delaunay triangulation”. Strictly speaking, the nearest-store regions are the Voronoi (Dirichlet) cells, the dual of the Delaunay triangulation, and these cells are what the packages below return.
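The assignment rule itself can be illustrated without any tessellation package: compute the distance from every cell centre to every store, pick the nearest store, and sum the population per store. This brute-force sketch uses made-up coordinates and populations. It is not the method used in the report, but for cell centres it yields the same partition as the Voronoi cells.

```r
# Toy store locations and grid-cell centres in projected metres (made up)
stores <- data.frame(id = 1:2, x = c(0, 4000), y = c(0, 0))
cells  <- data.frame(x = c(500, 1500, 3500, 4500),
                     y = c(0, 500, 500, 0),
                     pop_tot = c(100, 50, 30, 20))

# Distance from every cell (rows) to every store (columns)
dx <- outer(cells$x, stores$x, "-")
dy <- outer(cells$y, stores$y, "-")
dist_m <- sqrt(dx^2 + dy^2)

# Nearest store per cell: the column with the smallest distance
cells$store_id <- stores$id[apply(dist_m, 1, which.min)]

# Potential market per store: population summed over its nearest cells
market <- tapply(cells$pop_tot, cells$store_id, sum)
```

Here the first two cells fall to store 1 and the last two to store 2, giving potential markets of 150 and 50.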

5.1.Proximity Analysis

In R, two main packages were found for implementing Delaunay triangulation: “deldir” and “dismo”. The two packages are compared, and the output is used to answer the research question.

5.1.1.deldir package

# st_coordinates() retrieves the point coordinates as a matrix;
# as.data.frame() turns it into a dataframe with columns X and Y
coor <- as.data.frame(st_coordinates(bunnpris_sf_longlat))

library(deldir)

# Defining the coordinates 
x <- coor$X
y <- coor$Y

# Calculate the Delaunay triangulation, then the tiles.
z <- deldir(x,y)
w <- tile.list(z)


# Make a list of colours, and use 'em to plot:
ccc <- terrain.colors(18) 
plot(w,fillcol=ccc,close=TRUE)

Comments on the deldir package

1- In our experimentation, it worked only with longitude/latitude coordinates and did not work with UTM coordinates.

2- The tile.list output is a deldir-specific object that proved hard to convert to an sp object for use in further analysis.
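One possible route around the conversion problem is to build the polygons by hand from the vertex coordinates that each tile carries, targeting sf rather than sp. The sketch below is an assumption-laden illustration, not something tried in this project: the coordinates are made up, and `tile_to_polygon` is a hypothetical helper name.

```r
library(deldir)
library(sf)

# Toy store coordinates in longitude/latitude (made up, near Bergen)
x <- c(5.31, 5.32, 5.33, 5.35, 5.37)
y <- c(60.39, 60.38, 60.40, 60.37, 60.39)

tiles <- tile.list(deldir(x, y))

# Each tile stores the x and y vertices of one polygon; close the ring
# by repeating the first vertex, as simple-feature polygons require
tile_to_polygon <- function(tile) {
  ring <- cbind(c(tile$x, tile$x[1]), c(tile$y, tile$y[1]))
  st_polygon(list(ring))
}

# One polygon per input point, tagged with longitude/latitude (EPSG:4326)
tiles_sf <- st_sf(
  id = seq_along(tiles),
  geometry = st_sfc(lapply(tiles, tile_to_polygon), crs = 4326)
)
```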

5.1.2.dismo package

library('dismo')

#Getting the coordinates
coor=as.data.frame(st_coordinates(bunnpris_sf))

#Converting x and y coordinates into a matrix
x <- coor$X
y <- coor$Y
points <- matrix(c(x,y), ncol=2)


#Implementing the triangulation and setting the boundaries as min and max of x and y coordinates of Bergen area
vor <- voronoi(points,ext = c(-40733.07,-10356.25,6714213,6749144) )


#plotting
spplot(vor, "id")

#Checking the class of the returned object 
class(vor)
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

Comments on the dismo package

1- In our experimentation, it worked with both longitude/latitude and UTM coordinates.

2- The output is an sp object, which makes it easier to use for further analysis.
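The plotting and joining code in section 5.2 uses an sf object named `vor_sf`, whose creation is not shown. A plausible missing step, sketched here as an assumption, is to convert the sp output to sf and declare its CRS. The example uses a hand-built one-polygon stand-in (`vor_like`) in place of the real `vor` object so that it runs on its own.

```r
library(sp)
library(sf)

# Stand-in for the voronoi() output: a one-polygon SpatialPolygonsDataFrame
ring <- cbind(c(0, 0, 1000, 1000, 0), c(0, 1000, 1000, 0, 0))
sp_poly <- SpatialPolygons(list(Polygons(list(Polygon(ring)), ID = "1")))
vor_like <- SpatialPolygonsDataFrame(sp_poly,
                                     data.frame(id = 1, row.names = "1"))

# Convert the sp object to sf
vor_sf <- st_as_sf(vor_like)

# Polygons built from a bare coordinate matrix carry no CRS, so declare
# the one the coordinates are already in (UTM zone 33N, EPSG:32633)
st_crs(vor_sf) <- 32633
```

Assigning the CRS this way only labels the coordinates; it does not reproject them, which is appropriate when the input points were already in UTM zone 33N.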

5.2.Market Geo-segmentation

In this section, a series of operations is implemented to calculate the total number of potential customers (the population) in each triangulated area for each store.

Now, the triangulated areas can be plotted on the Bergen map.

tmap_mode("view")
tm_shape(Bergen_popluation)+ tm_fill(col ="pop_tot" , alpha = .8 , n = 10) + tm_shape(bunnpris_sf) + tm_symbols(shape=1,size =.3,col = "red" )  + tm_shape(vor_sf) +tm_borders(col = "blue", lwd = 2, lty = "solid")+tm_text("id")

The following chunk of code joins the triangulation polygons with the Bergen_popluation data and aggregates the population of each triangulated area.

Bergen_popluation_tringulated = Bergen_popluation %>%
  st_join(vor_sf) %>%
  group_by(id) %>%
  summarize(total_population_for_each_store = sum(pop_tot, na.rm = TRUE))
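One thing to watch with a polygon-to-polygon join is that `st_join()` matches on intersection by default, so a grid cell that straddles the border between two areas may be counted in both. The toy sketch below contrasts the default join with a join on cell centroids, which assigns each cell to exactly one area. All geometries and numbers are made up, and `make_rect` is a hypothetical helper.

```r
library(sf)

# Hypothetical helper: a rectangle of height 1 spanning x0 to x1
make_rect <- function(x0, x1) {
  st_polygon(list(rbind(c(x0, 0), c(x1, 0), c(x1, 1), c(x0, 1), c(x0, 0))))
}

# Four unit cells with populations; two areas whose border (x = 1.7)
# cuts through the second cell
cells <- st_sf(pop_tot = c(10, 20, 30, 40),
               geometry = st_sfc(make_rect(0, 1), make_rect(1, 2),
                                 make_rect(2, 3), make_rect(3, 4)))
areas <- st_sf(id = c(1, 2),
               geometry = st_sfc(make_rect(0, 1.7), make_rect(1.7, 4)))

# Default join (st_intersects): the straddling cell matches both areas
by_intersects <- st_join(cells, areas)
pop_intersects <- tapply(by_intersects$pop_tot, by_intersects$id, sum)

# Centroid join: every cell matches exactly one area
centroids <- suppressWarnings(st_centroid(cells))
by_centroid <- st_join(centroids, areas, join = st_within)
pop_centroid <- tapply(by_centroid$pop_tot, by_centroid$id, sum)
```

In this toy case the default join sums to 120 although only 100 people exist, while the centroid join sums to 100. Whether the effect matters for the Bergen data depends on how many cells lie on area borders.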

Visualizing the distribution of the potential market across the 18 locations.

The left graph shows the total number of potential customers for each store, labeled by its ID. The vertical blue line is the mean number of potential customers across all the stores. Assuming that the mean is the standard population that each store should serve, some areas appear under-served, such as 14, 15 and 18. Since these areas are not in the city center, this suggests an opportunity for establishing new stores. Areas in the city center, such as 1, 2, 3 and 4, can be considered over-served. However, it has to be taken into account that there is an inflow (“mobility”) of people to the center: potential customers are not only the people who live in these areas but also people who come to work in or visit the center every day, and they are not represented in the data.
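The comparison against the mean can be written down in a few lines. The numbers below are made up for illustration and are not the study's results. The sketch also assumes one particular reading of the terms, namely that a catchment population above the mean marks an area as under-served (many residents per store) and one below the mean as over-served.

```r
# Illustrative catchment populations (made-up numbers, not the study's results)
market <- c(`1` = 4000, `2` = 6000, `3` = 18000, `4` = 12000)

# The "standard" population per store is taken to be the mean
benchmark <- mean(market)

# Ratio to the benchmark: 1 means exactly average
ratio <- market / benchmark

# Assumed reading: above the mean means many residents per store
status <- ifelse(ratio > 1, "under-served", "over-served")
```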

6.Limitations

As mentioned in the introduction, many factors affect the competitiveness of a retail store’s location. To simplify the analysis, we have considered only the total number of people who live near a store location as its potential market. Other factors that should be taken into consideration to advance this research in the future include:
* The daily inflow of people to the coverage area of a store
* Accessibility and convenience of a store in terms of being easy to reach by customers
* The effect of existing competing stores in the nearby area

7.Summary

In order to evaluate the existing locations of Bunnpris stores in the Bergen area, Norway, we examined whether there is a relationship between each store and its market potential. The total population of each store’s coverage area was used as a proxy for market potential. To define the coverage area of each store, the Delaunay triangulation algorithm was implemented. We found the dismo R package to be more practical than deldir for this purpose. The market potential of Bunnpris stores was found to vary; however, there are some limitations that should be considered in future research.

8.References

Berman, B., Evans, J. R., & Chatterjee, P. (1995). Retail management: A strategic approach.

Huff, D., & McCallum, B. M. (2008). Calibrating the Huff model using ArcGIS Business Analyst. ESRI White Paper.

Wieland, T. (2017). Market Area Analysis for Retail and Service Locations with MCI. R Journal, 9(1).

9.Bibliography

Cooley, D. (2017). googleway: Accesses Google Maps APIs to Retrieve Data and Plot Maps. R package version, 2(0).

Hijmans, R. J., Phillips, S., Leathwick, J., Elith, J., & Hijmans, M. R. J. (2017). Package ‘dismo’. Circles, 9(1).

Lovelace, R., Nowosad, J., & Muenchow, J. Geocomputation with R.

Turner, R. (2015). Delaunay Triangulation and Dirichlet (Voronoi) Tessellation.

Sherif Ahmed Analytics Consultant

2019-01-18